Overview

Dataset statistics

Number of variables             27
Number of observations          396030
Missing cells                   81592
Missing cells (%)               0.8%
Duplicate rows                  0
Duplicate rows (%)              0.0%
Total size in memory            427.5 MiB
Average record size in memory   1.1 KiB
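
The percentage and per-record figures in this table follow from the raw counts. The sketch below recomputes them from the numbers reported above; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 27
n_observations = 396_030
missing_cells = 81_592
total_size_mib = 427.5

# A "cell" is one value of one variable for one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 KiB, so the average record size in KiB is:
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```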

Variable types

CAT   15
NUM   12

Warnings

emp_title has a high cardinality: 173104 distinct values High cardinality
issue_d has a high cardinality: 115 distinct values High cardinality
title has a high cardinality: 48817 distinct values High cardinality
earliest_cr_line has a high cardinality: 684 distinct values High cardinality
address has a high cardinality: 393700 distinct values High cardinality
installment is highly correlated with loan_amnt High correlation
loan_amnt is highly correlated with installment High correlation
sub_grade is highly correlated with grade High correlation
grade is highly correlated with sub_grade High correlation
emp_title has 22930 (5.8%) missing values Missing
emp_length has 18301 (4.6%) missing values Missing
mort_acc has 37795 (9.5%) missing values Missing
annual_inc is highly skewed (γ1 = 41.04272475) Skewed
dti is highly skewed (γ1 = 431.0512254) Skewed
address is uniformly distributed Uniform
pub_rec has 338272 (85.4%) zeros Zeros
mort_acc has 139777 (35.3%) zeros Zeros
pub_rec_bankruptcies has 350380 (88.5%) zeros Zeros
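
The percentages in the Missing and Zeros warnings are counts divided by the number of observations. A minimal sketch that recomputes them from the counts listed above (the helper `pct` is a hypothetical name, not a pandas-profiling function):

```python
n_observations = 396_030

# Counts copied from the warnings list.
missing = {"emp_title": 22_930, "emp_length": 18_301, "mort_acc": 37_795}
zeros = {"pub_rec": 338_272, "mort_acc": 139_777, "pub_rec_bankruptcies": 350_380}


def pct(count, total=n_observations):
    """Share of observations, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


missing_pct = {name: pct(count) for name, count in missing.items()}
zeros_pct = {name: pct(count) for name, count in zeros.items()}

print(missing_pct)
print(zeros_pct)
```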

Reproduction

Analysis started         2020-12-15 05:21:27.670941
Analysis finished        2020-12-15 05:23:10.407905
Duration                 1 minute and 42.74 seconds
Software version         pandas-profiling v2.9.0
Download configuration   config.yaml
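
The reported duration can be checked against the two timestamps with the standard library. This is only a consistency check on the figures above, not a reproduction of the profiling run.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-12-15 05:21:27.670941", FMT)
finished = datetime.strptime("2020-12-15 05:23:10.407905", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```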

Variables

loan_amnt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct       1397
Distinct (%)   0.4%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           14113.88809
Minimum        500
Maximum        40000
Zeros          0
Zeros (%)      0.0%
Memory size    3.0 MiB

Quantile statistics

Minimum                     500
5-th percentile             3250
Q1                          8000
median                      12000
Q3                          20000
95-th percentile            30975
Maximum                     40000
Range                       39500
Interquartile range (IQR)   12000
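
Range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them and, as an illustration only, adds the common 1.5 x IQR fences; that outlier rule is a convention assumed here and is not something the report itself computes.

```python
# Quantiles of loan_amnt copied from the table above.
minimum, q1, q3, maximum = 500, 8000, 20000, 40000

value_range = maximum - minimum
iqr = q3 - q1

# Tukey-style fences (an assumed convention, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range={value_range}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), max above fence: {maximum > upper_fence}")
```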

Descriptive statistics

Standard deviation                8357.441341
Coefficient of variation (CV)     0.5921430926
Kurtosis                          -0.06259753499
Mean                              14113.88809
Median Absolute Deviation (MAD)   5500
Skewness                          0.7772854671
Sum                               5589523100
Variance                          69846825.77
Monotonicity                      Not monotonic
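
Several of these statistics are tied together by definition: the mean is the sum over the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation over the mean. A short check using the reported loan_amnt figures:

```python
import math

# Figures copied from the loan_amnt tables.
n_observations = 396_030
total = 5_589_523_100
mean = 14113.88809
std = 8357.441341

mean_from_sum = total / n_observations
variance = std ** 2
cv = std / mean

print(f"mean={mean_from_sum:.5f}, variance={variance:.2f}, CV={cv:.10f}")
```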
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
10000                 27668    7.0%
12000                 21366    5.4%
15000                 19903    5.0%
20000                 18969    4.8%
35000                 14576    3.7%
8000                  13539    3.4%
6000                  12734    3.2%
5000                  12443    3.1%
16000                 10129    2.6%
18000                 9195     2.3%
25000                 9067     2.3%
24000                 8684     2.2%
30000                 6860     1.7%
7000                  6744     1.7%
14000                 5963     1.5%
28000                 5517     1.4%
9000                  5491     1.4%
4000                  5400     1.4%
21000                 5090     1.3%
3000                  4888     1.2%
13000                 3377     0.9%
9600                  3328     0.8%
7200                  3219     0.8%
11000                 3200     0.8%
2000                  2647     0.7%
Other values (1372)   156033   39.4%
 
Value   Count   Frequency (%)
500     4       < 0.1%
700     1       < 0.1%
725     1       < 0.1%
750     1       < 0.1%
800     1       < 0.1%
900     1       < 0.1%
950     1       < 0.1%
1000    1448    0.4%
1025    4       < 0.1%
1050    10      < 0.1%
 
Value   Count   Frequency (%)
40000   180     < 0.1%
39700   1       < 0.1%
39600   1       < 0.1%
39500   1       < 0.1%
39475   1       < 0.1%
39200   1       < 0.1%
38825   1       < 0.1%
38750   1       < 0.1%
38475   1       < 0.1%
38300   1       < 0.1%
 

term
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value       Count    Frequency (%)
36 months   302005   76.3%
60 months   94025    23.7%
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length10
Median length10
Mean length10
Min length10

Overview of Unicode Properties

Unique unicode characters10
Unique unicode categories3 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
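
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's `unicodedata` module. The sample string is an assumption: every `term` value is reported with length 10, which suggests a leading space before "36 months". This is not the code pandas-profiling runs.

```python
import unicodedata
from collections import Counter

# Assumed raw value (leading space inferred from the reported length of 10).
sample = " 36 months"

# Two-letter general categories: Zs = space separator,
# Nd = decimal number, Ll = lowercase letter.
category_counts = Counter(unicodedata.category(ch) for ch in sample)
category_share = {cat: n / len(sample) for cat, n in category_counts.items()}

print(category_counts)
print(category_share)
```

Under that assumption the shares come out as 60% lowercase letters, 20% spaces and 20% digits, in line with the category table for this variable.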

Most occurring characters

ValueCountFrequency (%) 
79206020.0%
 
639603010.0%
 
m39603010.0%
 
o39603010.0%
 
n39603010.0%
 
t39603010.0%
 
h39603010.0%
 
s39603010.0%
 
33020057.6%
 
0940252.4%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter237618060.0%
 
Space Separator79206020.0%
 
Decimal Number79206020.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
792060100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
639603050.0%
 
330200538.1%
 
09402511.9%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
m39603016.7%
 
o39603016.7%
 
n39603016.7%
 
t39603016.7%
 
h39603016.7%
 
s39603016.7%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin237618060.0%
 
Common158412040.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
79206050.0%
 
639603025.0%
 
330200519.1%
 
0940255.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
m39603016.7%
 
o39603016.7%
 
n39603016.7%
 
t39603016.7%
 
h39603016.7%
 
s39603016.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII3960300100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
79206020.0%
 
639603010.0%
 
m39603010.0%
 
o39603010.0%
 
n39603010.0%
 
t39603010.0%
 
h39603010.0%
 
s39603010.0%
 
33020057.6%
 
0940252.4%
 

int_rate
Real number (ℝ≥0)

Distinct       566
Distinct (%)   0.1%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           13.63940005
Minimum        5.32
Maximum        30.99
Zeros          0
Zeros (%)      0.0%
Memory size    3.0 MiB

Quantile statistics

Minimum                     5.32
5-th percentile             6.89
Q1                          10.49
median                      13.33
Q3                          16.49
95-th percentile            21.97
Maximum                     30.99
Range                       25.67
Interquartile range (IQR)   6

Descriptive statistics

Standard deviation                4.472157382
Coefficient of variation (CV)     0.3278851978
Kurtosis                          -0.1439465381
Mean                              13.63940005
Median Absolute Deviation (MAD)   3.08
Skewness                          0.420669472
Sum                               5401611.6
Variance                          20.00019165
Monotonicity                      Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
10.99124113.1%
 
12.9996322.4%
 
15.6193502.4%
 
11.9985822.2%
 
8.980192.0%
 
12.1273581.9%
 
7.973321.9%
 
16.2966321.7%
 
13.1165801.7%
 
6.0362911.6%
 
17.5762121.6%
 
15.3161101.5%
 
9.1761081.5%
 
13.9957221.4%
 
14.3356701.4%
 
16.9956441.4%
 
18.2552531.3%
 
9.9952481.3%
 
11.1452401.3%
 
7.6248391.2%
 
12.6947871.2%
 
13.9845861.2%
 
14.6543251.1%
 
12.4942071.1%
 
7.8941931.1%
 
Other values (541)23569959.5%
 
ValueCountFrequency (%) 
5.3224400.6%
 
5.424650.1%
 
5.793330.1%
 
5.934310.1%
 
5.992780.1%
 
670< 0.1%
 
6.0362911.6%
 
6.172200.1%
 
6.2411840.3%
 
6.396560.2%
 
ValueCountFrequency (%) 
30.9913< 0.1%
 
30.943< 0.1%
 
30.893< 0.1%
 
30.841< 0.1%
 
30.799< 0.1%
 
30.744< 0.1%
 
30.495< 0.1%
 
29.997< 0.1%
 
29.968< 0.1%
 
29.6715< 0.1%
 

installment
Real number (ℝ≥0)

HIGH CORRELATION

Distinct       55706
Distinct (%)   14.1%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           431.849698
Minimum        16.08
Maximum        1533.81
Zeros          0
Zeros (%)      0.0%
Memory size    3.0 MiB
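
One plausible reason for the high correlation with loan_amnt is that a fixed monthly payment is a function of the principal, the rate and the term. The sketch below uses the standard amortization formula; that the dataset's installment column was produced this way is an assumption, and `monthly_installment` is a hypothetical helper.

```python
def monthly_installment(principal, annual_rate_pct, n_months):
    """Fixed monthly payment for a fully amortizing loan (standard annuity formula)."""
    r = annual_rate_pct / 100 / 12  # monthly interest rate
    if r == 0:
        return principal / n_months
    return principal * r / (1 - (1 + r) ** -n_months)


# Most frequent loan amount and interest rate in this report, with a 36-month term.
payment = monthly_installment(10_000, 10.99, 36)
print(f"{payment:.2f}")
```

With these inputs the result lands close to 327.34, the most frequent installment value in the report, which is consistent with the assumption but does not prove it.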

Quantile statistics

Minimum                     16.08
5-th percentile             109.51
Q1                          250.33
median                      375.43
Q3                          567.3
95-th percentile            925.6
Maximum                     1533.81
Range                       1517.73
Interquartile range (IQR)   316.97

Descriptive statistics

Standard deviation                250.7277895
Coefficient of variation (CV)     0.5805904014
Kurtosis                          0.7838199213
Mean                              431.849698
Median Absolute Deviation (MAD)   150.5
Skewness                          0.9835981609
Sum                               171025435.9
Variance                          62864.42443
Monotonicity                      Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
327.349680.2%
 
332.17910.2%
 
491.017360.2%
 
336.96860.2%
 
392.816830.2%
 
332.726410.2%
 
337.476240.2%
 
317.545740.1%
 
654.685560.1%
 
261.885270.1%
 
196.415250.1%
 
399.265230.1%
 
498.155230.1%
 
318.795140.1%
 
163.675000.1%
 
635.075000.1%
 
381.044910.1%
 
625.814880.1%
 
304.364840.1%
 
312.914660.1%
 
328.064620.1%
 
476.34550.1%
 
348.184550.1%
 
343.394470.1%
 
398.524470.1%
 
Other values (55681)38196496.4%
 
ValueCountFrequency (%) 
16.081< 0.1%
 
16.251< 0.1%
 
16.311< 0.1%
 
16.471< 0.1%
 
19.871< 0.1%
 
20.221< 0.1%
 
21.251< 0.1%
 
21.621< 0.1%
 
21.991< 0.1%
 
22.241< 0.1%
 
ValueCountFrequency (%) 
1533.811< 0.1%
 
15271< 0.1%
 
1503.851< 0.1%
 
1479.491< 0.1%
 
1464.421< 0.1%
 
1458.251< 0.1%
 
1451.142< 0.1%
 
1451.122< 0.1%
 
1445.91< 0.1%
 
1443.761< 0.1%
 

grade
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value   Count    Frequency (%)
B       116018   29.3%
C       105987   26.8%
A       64187    16.2%
D       63524    16.0%
E       31488    8.0%
F       11772    3.0%
G       3054     0.8%
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length1
Median length1
Mean length1
Min length1

Overview of Unicode Properties

Unique unicode characters7
Unique unicode categories1 ?
Unique unicode scripts1 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter396030100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin396030100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII396030100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

sub_grade
Categorical

HIGH CORRELATION

Distinct35
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value               Count   Frequency (%)
B3                  26655   6.7%
B4                  25601   6.5%
C1                  23662   6.0%
C2                  22580   5.7%
B2                  22495   5.7%
B5                  22085   5.6%
C3                  21221   5.4%
C4                  20280   5.1%
B1                  19182   4.8%
A5                  18526   4.7%
C5                  18244   4.6%
D1                  15993   4.0%
A4                  15789   4.0%
D2                  13951   3.5%
D3                  12223   3.1%
D4                  11657   2.9%
A3                  10576   2.7%
A1                  9729    2.5%
D5                  9700    2.4%
A2                  9567    2.4%
E1                  7917    2.0%
E2                  7431    1.9%
E3                  6207    1.6%
E4                  5361    1.4%
E5                  4572    1.2%
Other values (10)   14826   3.7%
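
The correlation warning between grade and sub_grade is expected: the first character of each sub_grade is its grade. The sketch below aggregates the sub_grade counts listed above by that first character; only grades A to E can be checked this way, because the F and G sub-grades are folded into "Other values (10)".

```python
from collections import Counter

# Counts copied from the sub_grade frequency table (A to E only).
sub_grade_counts = {
    "A1": 9729, "A2": 9567, "A3": 10576, "A4": 15789, "A5": 18526,
    "B1": 19182, "B2": 22495, "B3": 26655, "B4": 25601, "B5": 22085,
    "C1": 23662, "C2": 22580, "C3": 21221, "C4": 20280, "C5": 18244,
    "D1": 15993, "D2": 13951, "D3": 12223, "D4": 11657, "D5": 9700,
    "E1": 7917, "E2": 7431, "E3": 6207, "E4": 5361, "E5": 4572,
}

grade_counts = Counter()
for sub_grade, count in sub_grade_counts.items():
    grade_counts[sub_grade[0]] += count  # the grade is the leading letter

print(dict(grade_counts))
```

The totals match the grade table (for example, B sums to 116018), so sub_grade carries all the information in grade and one of the two columns is redundant for most modelling purposes.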
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length2
Median length2
Mean length2
Min length2

Overview of Unicode Properties

Unique unicode characters12
Unique unicode categories2 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
B11601814.6%
 
C10598713.4%
 
18107710.2%
 
48084910.2%
 
37972010.1%
 
27954410.0%
 
5748409.4%
 
A641878.1%
 
D635248.0%
 
E314884.0%
 
F117721.5%
 
G30540.4%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter39603050.0%
 
Decimal Number39603050.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
18107720.5%
 
48084920.4%
 
37972020.1%
 
27954420.1%
 
57484018.9%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin39603050.0%
 
Common39603050.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
B11601829.3%
 
C10598726.8%
 
A6418716.2%
 
D6352416.0%
 
E314888.0%
 
F117723.0%
 
G30540.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
18107720.5%
 
48084920.4%
 
37972020.1%
 
27954420.1%
 
57484018.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII792060100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
B11601814.6%
 
C10598713.4%
 
18107710.2%
 
48084910.2%
 
37972010.1%
 
27954410.0%
 
5748409.4%
 
A641878.1%
 
D635248.0%
 
E314884.0%
 
F117721.5%
 
G30540.4%
 

emp_title
Categorical

HIGH CARDINALITY
MISSING

Distinct173104
Distinct (%)46.4%
Missing22930
Missing (%)5.8%
Memory size3.0 MiB
Value                      Count    Frequency (%)
Teacher                    4389     1.1%
Manager                    4250     1.1%
Registered Nurse           1856     0.5%
RN                         1846     0.5%
Supervisor                 1830     0.5%
Sales                      1638     0.4%
Project Manager            1505     0.4%
Owner                      1410     0.4%
Driver                     1339     0.3%
Office Manager             1218     0.3%
manager                    1145     0.3%
Director                   1089     0.3%
General Manager            1074     0.3%
Engineer                   995      0.3%
teacher                    962      0.2%
driver                     882      0.2%
Vice President             857      0.2%
Operations Manager         763      0.2%
Administrative Assistant   756      0.2%
Accountant                 748      0.2%
President                  742      0.2%
owner                      697      0.2%
Account Manager            692      0.2%
Police Officer             686      0.2%
supervisor                 673      0.2%
Other values (173079)      339058   85.6%
(Missing)                  22930    5.8%
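
Part of the high cardinality comes from case variants of the same title ("Manager" and "manager", "Teacher" and "teacher"). A minimal sketch of one possible normalization, assuming that case and surrounding whitespace carry no meaning; `normalize_title` is a hypothetical helper and the counts are copied from the table above.

```python
from collections import Counter

title_counts = {
    "Teacher": 4389, "teacher": 962,
    "Manager": 4250, "manager": 1145,
    "Supervisor": 1830, "supervisor": 673,
    "Driver": 1339, "driver": 882,
    "Owner": 1410, "owner": 697,
}


def normalize_title(title):
    """Collapse runs of whitespace and ignore case."""
    return " ".join(title.split()).casefold()


merged = Counter()
for title, count in title_counts.items():
    merged[normalize_title(title)] += count

print(merged.most_common())
```

After merging, "manager" (5395) overtakes "teacher" (5351) as the most frequent title among these ten entries. Synonyms such as "Registered Nurse" and "RN" would need a separate mapping.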
 
Frequencies of value counts

Unique

Unique145247 ?
Unique (%)38.9%
Histogram of lengths of the category

Length

Max length78
Median length15
Mean length15.80017928
Min length1

Overview of Unicode Properties

Unique unicode characters125
Unique unicode categories17 ?
Unique unicode scripts2 ?
Unique unicode blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e6062069.7%
 
4878367.8%
 
a4783117.6%
 
r4704497.5%
 
n4510627.2%
 
i4060946.5%
 
t3734576.0%
 
o3309755.3%
 
s2939454.7%
 
c2441753.9%
 
l2000563.2%
 
u1200351.9%
 
g1187151.9%
 
S1154801.8%
 
d997321.6%
 
p989511.6%
 
m980041.6%
 
C937031.5%
 
h882471.4%
 
A877171.4%
 
M731501.2%
 
y686791.1%
 
f663561.1%
 
v594771.0%
 
P576880.9%
 
Other values (100)66884510.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter477012976.2%
 
Uppercase Letter93990515.0%
 
Space Separator4878397.8%
 
Other Punctuation456870.7%
 
Decimal Number63830.1%
 
Dash Punctuation55410.1%
 
Open Punctuation847< 0.1%
 
Close Punctuation821< 0.1%
 
Math Symbol114< 0.1%
 
Control30< 0.1%
 
Modifier Symbol18< 0.1%
 
Currency Symbol11< 0.1%
 
Other Symbol7< 0.1%
 
Connector Punctuation6< 0.1%
 
Other Number4< 0.1%
 
Format2< 0.1%
 
Final Punctuation1< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S11548012.3%
 
C9370310.0%
 
A877179.3%
 
M731507.8%
 
P576886.1%
 
T543715.8%
 
E502415.3%
 
I491835.2%
 
R484385.2%
 
D457844.9%
 
O364523.9%
 
L341603.6%
 
N338253.6%
 
B270172.9%
 
F251222.7%
 
H249542.7%
 
G196742.1%
 
U177981.9%
 
W129621.4%
 
V124551.3%
 
K54500.6%
 
J50670.5%
 
Y47920.5%
 
Q27280.3%
 
X10010.1%
 
Other values (5)6930.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e60620612.7%
 
a47831110.0%
 
r4704499.9%
 
n4510629.5%
 
i4060948.5%
 
t3734577.8%
 
o3309756.9%
 
s2939456.2%
 
c2441755.1%
 
l2000564.2%
 
u1200352.5%
 
g1187152.5%
 
d997322.1%
 
p989512.1%
 
m980042.1%
 
h882471.8%
 
y686791.4%
 
f663561.4%
 
v594771.2%
 
k310790.7%
 
b234520.5%
 
w223660.5%
 
x95610.2%
 
j56640.1%
 
z30240.1%
 
Other values (8)2057< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
487836> 99.9%
 
 3< 0.1%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.1874441.0%
 
,979021.4%
 
/792117.3%
 
&635213.9%
 
'25235.5%
 
#1400.3%
 
;450.1%
 
:430.1%
 
!310.1%
 
"280.1%
 
\260.1%
 
@18< 0.1%
 
*16< 0.1%
 
%4< 0.1%
 
?3< 0.1%
 
¡2< 0.1%
 
1< 0.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-5541100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
1142822.4%
 
2130320.4%
 
3100215.7%
 
45488.6%
 
04356.8%
 
54096.4%
 
63856.0%
 
93355.2%
 
73215.0%
 
82173.4%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(83999.1%
 
[70.8%
 
{10.1%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)81699.4%
 
]40.5%
 
}10.1%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+9280.7%
 
|1614.0%
 
~43.5%
 
¬10.9%
 
<10.9%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_6100.0%
 

Most frequent Control characters

ValueCountFrequency (%) 
€826.7%
 
ƒ723.3%
 
™310.0%
 
’26.7%
 
‚26.7%
 
š26.7%
 
26.7%
 
†13.3%
 
…13.3%
 
œ13.3%
 
“13.3%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
²375.0%
 
³125.0%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$872.7%
 
¢327.3%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
©7100.0%
 

Most frequent Final Punctuation characters

ValueCountFrequency (%) 
1100.0%
 

Most frequent Format characters

ValueCountFrequency (%) 
­150.0%
 
150.0%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`18100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin571003491.3%
 
Common5473118.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e60620610.6%
 
a4783118.4%
 
r4704498.2%
 
n4510627.9%
 
i4060947.1%
 
t3734576.5%
 
o3309755.8%
 
s2939455.1%
 
c2441754.3%
 
l2000563.5%
 
u1200352.1%
 
g1187152.1%
 
S1154802.0%
 
d997321.7%
 
p989511.7%
 
m980041.7%
 
C937031.6%
 
h882471.5%
 
A877171.5%
 
M731501.3%
 
y686791.2%
 
f663561.2%
 
v594771.0%
 
P576881.0%
 
T543711.0%
 
Other values (38)5549999.7%
 

Most frequent Common characters

ValueCountFrequency (%) 
48783689.1%
 
.187443.4%
 
,97901.8%
 
/79211.4%
 
&63521.2%
 
-55411.0%
 
'25230.5%
 
114280.3%
 
213030.2%
 
310020.2%
 
(8390.2%
 
)8160.1%
 
45480.1%
 
04350.1%
 
54090.1%
 
63850.1%
 
93350.1%
 
73210.1%
 
8217< 0.1%
 
#140< 0.1%
 
+92< 0.1%
 
;45< 0.1%
 
:43< 0.1%
 
!31< 0.1%
 
"28< 0.1%
 
Other values (37)187< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6257240> 99.9%
 
None103< 0.1%
 
Punctuation2< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e6062069.7%
 
4878367.8%
 
a4783117.6%
 
r4704497.5%
 
n4510627.2%
 
i4060946.5%
 
t3734576.0%
 
o3309755.3%
 
s2939454.7%
 
c2441753.9%
 
l2000563.2%
 
u1200351.9%
 
g1187151.9%
 
S1154801.8%
 
d997321.6%
 
p989511.6%
 
m980041.6%
 
C937031.5%
 
h882471.4%
 
A877171.4%
 
M731501.2%
 
y686791.1%
 
f663561.1%
 
v594771.0%
 
P576880.9%
 
Other values (68)66874010.7%
 

Most frequent None characters

ValueCountFrequency (%) 
Ã2120.4%
 
Â109.7%
 
â87.8%
 
€87.8%
 
ƒ76.8%
 
©76.8%
 
é43.9%
 
²32.9%
 
¢32.9%
 
 32.9%
 
™32.9%
 
¡21.9%
 
á21.9%
 
Æ21.9%
 
’21.9%
 
‚21.9%
 
š21.9%
 
ñ21.9%
 
­11.0%
 
11.0%
 
†11.0%
 
¬11.0%
 
…11.0%
 
œ11.0%
 
í11.0%
 
Other values (5)54.9%
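
Sequences such as "Ã", "Â" and "â€" among the non-ASCII characters are typical of UTF-8 text that was decoded as Windows-1252. The report does not state the cause, so the sketch below is a guess at a repair, with hypothetical sample strings; it leaves a value unchanged whenever the round trip fails.

```python
def repair_mojibake(text):
    """Undo a UTF-8 -> cp1252 mis-decoding when possible, else return text as-is."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical examples (not taken from the dataset), written with escapes:
# "Caf" + U+00C3 U+00A9 is the mis-decoded form of "Caf" + U+00E9 (e with acute).
print(repair_mojibake("Caf\u00c3\u00a9"))
print(repair_mojibake("Manager"))
```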
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
150.0%
 
150.0%
 

emp_length
Categorical

MISSING

Distinct11
Distinct (%)< 0.1%
Missing18301
Missing (%)4.6%
Memory size3.0 MiB
Value       Count    Frequency (%)
10+ years   126041   31.8%
2 years     35827    9.0%
< 1 year    31725    8.0%
3 years     31665    8.0%
5 years     26495    6.7%
1 year      25882    6.5%
4 years     23952    6.0%
6 years     20841    5.3%
7 years     20819    5.3%
8 years     19168    4.8%
9 years     15314    3.9%
(Missing)   18301    4.6%
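
Because the eleven categories are ordered, emp_length is often converted to a number before modelling. The sketch below shows one way to do that; mapping "< 1 year" to 0 and "10+ years" to 10 is a modelling choice assumed here, and `parse_emp_length` is a hypothetical helper.

```python
import re


def parse_emp_length(value):
    """Map labels like '10+ years' or '< 1 year' to whole years; None stays None."""
    if value is None:
        return None
    text = value.strip()
    if text.startswith("<"):
        return 0  # assumed: less than one year counts as 0
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


labels = ["10+ years", "2 years", "< 1 year", "1 year", None]
years = [parse_emp_length(label) for label in labels]
print(years)
```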
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length9
Median length7
Mean length7.466431836
Min length3

Overview of Unicode Properties

Unique unicode characters19
Unique unicode categories4 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
40945413.8%
 
a39603013.4%
 
y37772912.8%
 
e37772912.8%
 
r37772912.8%
 
s32012210.8%
 
11836486.2%
 
01260414.3%
 
+1260414.3%
 
n366021.2%
 
2358271.2%
 
<317251.1%
 
3316651.1%
 
5264950.9%
 
4239520.8%
 
6208410.7%
 
7208190.7%
 
8191680.6%
 
9153140.5%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter188594163.8%
 
Decimal Number50377017.0%
 
Space Separator40945413.8%
 
Math Symbol1577665.3%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
118364836.5%
 
012604125.0%
 
2358277.1%
 
3316656.3%
 
5264955.3%
 
4239524.8%
 
6208414.1%
 
7208194.1%
 
8191683.8%
 
9153143.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+12604179.9%
 
<3172520.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
409454100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a39603021.0%
 
y37772920.0%
 
e37772920.0%
 
r37772920.0%
 
s32012217.0%
 
n366021.9%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin188594163.8%
 
Common107099036.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
40945438.2%
 
118364817.1%
 
012604111.8%
 
+12604111.8%
 
2358273.3%
 
<317253.0%
 
3316653.0%
 
5264952.5%
 
4239522.2%
 
6208411.9%
 
7208191.9%
 
8191681.8%
 
9153141.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a39603021.0%
 
y37772920.0%
 
e37772920.0%
 
r37772920.0%
 
s32012217.0%
 
n366021.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII2956931100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
40945413.8%
 
a39603013.4%
 
y37772912.8%
 
e37772912.8%
 
r37772912.8%
 
s32012210.8%
 
11836486.2%
 
01260414.3%
 
+1260414.3%
 
n366021.2%
 
2358271.2%
 
<317251.1%
 
3316651.1%
 
5264950.9%
 
4239520.8%
 
6208410.7%
 
7208190.7%
 
8191680.6%
 
9153140.5%
 

home_ownership
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value      Count    Frequency (%)
MORTGAGE   198348   50.1%
RENT       159790   40.3%
OWN        37746    9.5%
OTHER      112      < 0.1%
NONE       31       < 0.1%
ANY        3        < 0.1%
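
Three of the six categories together cover fewer than 150 rows. A common preprocessing step, assumed here rather than taken from the report, is to fold such rare levels into a single bucket:

```python
# Counts copied from the home_ownership frequency table.
counts = {
    "MORTGAGE": 198_348, "RENT": 159_790, "OWN": 37_746,
    "OTHER": 112, "NONE": 31, "ANY": 3,
}

RARE = {"OTHER", "NONE", "ANY"}  # assumed choice of levels to merge

merged = {}
for category, count in counts.items():
    key = "OTHER" if category in RARE else category
    merged[key] = merged.get(key, 0) + count

print(merged)
```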
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length8
Median length8
Mean length5.908327652
Min length3

Overview of Unicode Properties

Unique unicode characters11
Unique unicode categories1 ?
Unique unicode scripts1 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
G39669617.0%
 
E35828115.3%
 
R35825015.3%
 
T35825015.3%
 
O23623710.1%
 
A1983518.5%
 
M1983488.5%
 
N1976018.4%
 
W377461.6%
 
H112< 0.1%
 
Y3< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter2339875100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
G39669617.0%
 
E35828115.3%
 
R35825015.3%
 
T35825015.3%
 
O23623710.1%
 
A1983518.5%
 
M1983488.5%
 
N1976018.4%
 
W377461.6%
 
H112< 0.1%
 
Y3< 0.1%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin2339875100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
G39669617.0%
 
E35828115.3%
 
R35825015.3%
 
T35825015.3%
 
O23623710.1%
 
A1983518.5%
 
M1983488.5%
 
N1976018.4%
 
W377461.6%
 
H112< 0.1%
 
Y3< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII2339875100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
G39669617.0%
 
E35828115.3%
 
R35825015.3%
 
T35825015.3%
 
O23623710.1%
 
A1983518.5%
 
M1983488.5%
 
N1976018.4%
 
W377461.6%
 
H112< 0.1%
 
Y3< 0.1%
 

annual_inc
Real number (ℝ≥0)

SKEWED

Distinct       27197
Distinct (%)   6.9%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           74203.1758
Minimum        0
Maximum        8706582
Zeros          1
Zeros (%)      < 0.1%
Memory size    3.0 MiB

Quantile statistics

Minimum                     0
5-th percentile             28000
Q1                          45000
median                      64000
Q3                          90000
95-th percentile            150000
Maximum                     8706582
Range                       8706582
Interquartile range (IQR)   45000

Descriptive statistics

Standard deviation                61637.62116
Coefficient of variation (CV)     0.8306601503
Kurtosis                          4238.550572
Mean                              74203.1758
Median Absolute Deviation (MAD)   21000
Skewness                          41.04272475
Sum                               2.938668371e+10
Variance                          3799196342
Monotonicity                      Not monotonic
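
The skewness above is driven by a handful of very large incomes. The sketch below computes the moment-based skewness on a toy sample built from the reported quantiles and maximum, and shows that a log transform reduces it. The sample is illustrative only, and the plain population formula is used here, which may differ from the bias-corrected estimator behind the reported value.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Toy sample: 5th percentile, Q1, median, Q3, 95th percentile and maximum.
incomes = [28_000, 45_000, 64_000, 90_000, 150_000, 8_706_582]

raw_skew = skewness(incomes)
log_skew = skewness([math.log1p(x) for x in incomes])

print(f"raw: {raw_skew:.2f}, log1p: {log_skew:.2f}")
```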
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
60000153133.9%
 
50000133033.4%
 
65000113332.9%
 
70000106742.7%
 
40000106292.7%
 
45000101142.6%
 
8000099712.5%
 
7500098502.5%
 
5500091952.3%
 
9000075731.9%
 
10000074801.9%
 
8500069361.8%
 
3500065441.7%
 
3000062501.6%
 
12000057671.5%
 
5200053161.3%
 
4200052961.3%
 
4800050481.3%
 
11000048701.2%
 
7200043691.1%
 
9500041001.0%
 
3600036660.9%
 
15000034720.9%
 
6200034340.9%
 
3800032040.8%
 
Other values (27172)21232353.6%
 
ValueCountFrequency (%) 
01< 0.1%
 
6001< 0.1%
 
25001< 0.1%
 
40002< 0.1%
 
40801< 0.1%
 
42001< 0.1%
 
45241< 0.1%
 
48006< 0.1%
 
48881< 0.1%
 
50003< 0.1%
 
ValueCountFrequency (%) 
87065821< 0.1%
 
76000001< 0.1%
 
74463951< 0.1%
 
71417781< 0.1%
 
70000001< 0.1%
 
65000001< 0.1%
 
61000001< 0.1%
 
60000002< 0.1%
 
50000001< 0.1%
 
49000001< 0.1%
 

verification_status
Categorical

Distinct       3
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    3.0 MiB

Value             Count    Frequency (%)
Verified          139563   35.2%
Source Verified   131385   33.2%
Not Verified      125082   31.6%
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length15
Median length12
Mean length11.58564503
Min length8

Overview of Unicode Properties

Unique unicode characters13
Unique unicode categories3 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e92344520.1%
 
i79206017.3%
 
r52741511.5%
 
V3960308.6%
 
f3960308.6%
 
d3960308.6%
 
o2564675.6%
 
2564675.6%
 
S1313852.9%
 
u1313852.9%
 
c1313852.9%
 
N1250822.7%
 
t1250822.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter367929980.2%
 
Uppercase Letter65249714.2%
 
Space Separator2564675.6%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
V39603060.7%
 
S13138520.1%
 
N12508219.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e92344525.1%
 
i79206021.5%
 
r52741514.3%
 
f39603010.8%
 
d39603010.8%
 
o2564677.0%
 
u1313853.6%
 
c1313853.6%
 
t1250823.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
256467100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin433179694.4%
 
Common2564675.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e92344521.3%
 
i79206018.3%
 
r52741512.2%
 
V3960309.1%
 
f3960309.1%
 
d3960309.1%
 
o2564675.9%
 
S1313853.0%
 
u1313853.0%
 
c1313853.0%
 
N1250822.9%
 
t1250822.9%
 

Most frequent Common characters

ValueCountFrequency (%) 
256467100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII4588263100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e92344520.1%
 
i79206017.3%
 
r52741511.5%
 
V3960308.6%
 
f3960308.6%
 
d3960308.6%
 
o2564675.6%
 
2564675.6%
 
S1313852.9%
 
u1313852.9%
 
c1313852.9%
 
N1250822.7%
 
t1250822.7%
 

issue_d
Categorical

HIGH CARDINALITY

Distinct115
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value               Count    Frequency (%)
Oct-2014            14846    3.7%
Jul-2014            12609    3.2%
Jan-2015            11705    3.0%
Dec-2013            10618    2.7%
Nov-2013            10496    2.7%
Jul-2015            10270    2.6%
Oct-2013            10047    2.5%
Jan-2014            9705     2.5%
Apr-2015            9470     2.4%
Sep-2013            9179     2.3%
Aug-2013            9112     2.3%
Apr-2014            9020     2.3%
Nov-2014            8858     2.2%
May-2014            8840     2.2%
Jul-2013            8631     2.2%
Oct-2015            8401     2.1%
May-2015            8325     2.1%
Mar-2014            8108     2.0%
Jun-2013            7947     2.0%
Aug-2014            7860     2.0%
Feb-2014            7624     1.9%
Jun-2014            7610     1.9%
May-2013            7567     1.9%
Mar-2015            7268     1.8%
Feb-2015            7167     1.8%
Other values (90)   164747   41.6%
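
issue_d is flagged as a high-cardinality categorical only because the month labels are stored as strings. A minimal sketch of parsing them into dates with the standard library follows; `parse_issue_month` is a hypothetical helper, and `%b` depends on the locale, so English month abbreviations are assumed.

```python
from datetime import datetime


def parse_issue_month(value):
    """Parse labels like 'Oct-2014' into a datetime on the first day of the month."""
    return datetime.strptime(value, "%b-%Y")


labels = ["Oct-2014", "Jul-2014", "Jan-2015", "Dec-2013", "Nov-2013"]
parsed = sorted(parse_issue_month(label) for label in labels)

print([d.strftime("%Y-%m") for d in parsed])
```

Once parsed, the column can be treated as an ordered time variable instead of 115 unrelated categories.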
 
Frequencies of value counts

Unique

Unique1 ?
Unique (%)< 0.1%
Histogram of lengths of the category

Length

Max length8
Median length8
Mean length8
Min length8

Overview of Unicode Properties

Unique unicode characters33
Unique unicode categories4 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
243723213.8%
 
041054913.0%
 
140820412.9%
 
-39603012.5%
 
J1045363.3%
 
41028603.2%
 
u1026703.2%
 
a984963.1%
 
3976623.1%
 
5942643.0%
 
e854432.7%
 
c712122.2%
 
A660392.1%
 
r651422.1%
 
n648222.0%
 
M638142.0%
 
p608421.9%
 
O421301.3%
 
t421301.3%
 
l397141.3%
 
N340681.1%
 
o340681.1%
 
v340681.1%
 
g328161.0%
 
y318951.0%
 
Other values (8)1475344.7%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number158412050.0%
 
Lowercase Letter79206025.0%
 
Uppercase Letter39603012.5%
 
Dash Punctuation39603012.5%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
J10453626.4%
 
A6603916.7%
 
M6381416.1%
 
O4213010.6%
 
N340688.6%
 
D290827.3%
 
F287427.3%
 
S276197.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
u10267013.0%
 
a9849612.4%
 
e8544310.8%
 
c712129.0%
 
r651428.2%
 
n648228.2%
 
p608427.7%
 
t421305.3%
 
l397145.0%
 
o340684.3%
 
v340684.3%
 
g328164.1%
 
y318954.0%
 
b287423.6%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-396030100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
243723227.6%
 
041054925.9%
 
140820425.8%
 
41028606.5%
 
3976626.2%
 
5942646.0%
 
6280881.8%
 
938260.2%
 
812400.1%
 
7195< 0.1%
 

Most occurring scripts

ValueCountFrequency (%) 
Common198015062.5%
 
Latin118809037.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
J1045368.8%
 
u1026708.6%
 
a984968.3%
 
e854437.2%
 
c712126.0%
 
A660395.6%
 
r651425.5%
 
n648225.5%
 
M638145.4%
 
p608425.1%
 
O421303.5%
 
t421303.5%
 
l397143.3%
 
N340682.9%
 
o340682.9%
 
v340682.9%
 
g328162.8%
 
y318952.7%
 
D290822.4%
 
F287422.4%
 
b287422.4%
 
S276192.3%
 

Most frequent Common characters

ValueCountFrequency (%) 
243723222.1%
 
041054920.7%
 
140820420.6%
 
-39603020.0%
 
41028605.2%
 
3976624.9%
 
5942644.8%
 
6280881.4%
 
938260.2%
 
812400.1%
 
7195< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII3168240100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
243723213.8%
 
041054913.0%
 
140820412.9%
 
-39603012.5%
 
J1045363.3%
 
41028603.2%
 
u1026703.2%
 
a984963.1%
 
3976623.1%
 
5942643.0%
 
e854432.7%
 
c712122.2%
 
A660392.1%
 
r651422.1%
 
n648222.0%
 
M638142.0%
 
p608421.9%
 
O421301.3%
 
t421301.3%
 
l397141.3%
 
N340681.1%
 
o340681.1%
 
v340681.1%
 
g328161.0%
 
y318951.0%
 
Other values (8)1475344.7%
 

loan_status
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Value         Count    Frequency (%)
Fully Paid    318357   80.4%
Charged Off   77673    19.6%
 
Frequencies of value counts

Unique

Unique0 ?
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length11
Median length10
Mean length10.19612908
Min length10
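
With only two labels of length 10 and 11, the mean length is the count-weighted average of the label lengths, so it also encodes the charged-off rate. A short check using the counts above:

```python
# Counts copied from the loan_status frequency table.
counts = {"Fully Paid": 318_357, "Charged Off": 77_673}

total = sum(counts.values())
charged_off_rate = counts["Charged Off"] / total

# Weighted average of label lengths (10 for "Fully Paid", 11 for "Charged Off").
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(f"charged-off rate: {100 * charged_off_rate:.1f}%")
print(f"mean length: {mean_length:.8f}")
```

The roughly 80/20 split is worth keeping in mind if loan_status is used as a classification target, since plain accuracy would look good for a model that always predicts "Fully Paid".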

Overview of Unicode Properties

Unique unicode characters16
Unique unicode categories3 ?
Unique unicode scripts2 ?
Unique unicode blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
l63671415.8%
 
3960309.8%
 
a3960309.8%
 
d3960309.8%
 
F3183577.9%
 
u3183577.9%
 
y3183577.9%
 
P3183577.9%
 
i3183577.9%
 
f1553463.8%
 
C776731.9%
 
h776731.9%
 
r776731.9%
 
g776731.9%
 
e776731.9%
 
O776731.9%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter284988370.6%
 
Uppercase Letter79206019.6%
 
Space Separator3960309.8%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
F31835740.2%
 
P31835740.2%
 
C776739.8%
 
O776739.8%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
l63671422.3%
 
a39603013.9%
 
d39603013.9%
 
u31835711.2%
 
y31835711.2%
 
i31835711.2%
 
f1553465.5%
 
h776732.7%
 
r776732.7%
 
g776732.7%
 
e776732.7%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
396030100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin364194390.2%
 
Common3960309.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
l63671417.5%
 
a39603010.9%
 
d39603010.9%
 
F3183578.7%
 
u3183578.7%
 
y3183578.7%
 
P3183578.7%
 
i3183578.7%
 
f1553464.3%
 
C776732.1%
 
h776732.1%
 
r776732.1%
 
g776732.1%
 
e776732.1%
 
O776732.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
396030100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII4037973100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
l63671415.8%
 
3960309.8%
 
a3960309.8%
 
d3960309.8%
 
F3183577.9%
 
u3183577.9%
 
y3183577.9%
 
P3183577.9%
 
i3183577.9%
 
f1553463.8%
 
C776731.9%
 
h776731.9%
 
r776731.9%
 
g776731.9%
 
e776731.9%
 
O776731.9%
 

purpose
Categorical

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
debt_consolidation
234507 
credit_card
83019 
home_improvement
24030 
other
 
21185
major_purchase
 
8790
Other values (9)
24499 
ValueCountFrequency (%) 
debt_consolidation23450759.2%
 
credit_card8301921.0%
 
home_improvement240306.1%
 
other211855.3%
 
major_purchase87902.2%
 
small_business57011.4%
 
car46971.2%
 
medical41961.1%
 
moving28540.7%
 
vacation24520.6%
 
house22010.6%
 
wedding18120.5%
 
renewable_energy3290.1%
 
educational2570.1%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length18
Median length18
Mean length14.99784612
Min length3

Overview of Unicode Properties

Unique unicode characters22
Unique unicode categories2
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
o78932013.3%
 
d64312910.8%
 
t59995710.1%
 
i59333510.0%
 
n5067788.5%
 
e4354037.3%
 
c4209377.1%
 
_3563766.0%
 
a3554476.0%
 
s2683024.5%
 
l2506914.2%
 
b2405374.0%
 
r2341883.9%
 
m936311.6%
 
h562060.9%
 
p328200.6%
 
v293360.5%
 
u169490.3%
 
j87900.1%
 
g49950.1%
 
w2141< 0.1%
 
y329< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter558322194.0%
 
Connector Punctuation3563766.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
o78932014.1%
 
d64312911.5%
 
t59995710.7%
 
i59333510.6%
 
n5067789.1%
 
e4354037.8%
 
c4209377.5%
 
a3554476.4%
 
s2683024.8%
 
l2506914.5%
 
b2405374.3%
 
r2341884.2%
 
m936311.7%
 
h562061.0%
 
p328200.6%
 
v293360.5%
 
u169490.3%
 
j87900.2%
 
g49950.1%
 
w2141< 0.1%
 
y329< 0.1%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_356376100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin558322194.0%
 
Common3563766.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
o78932014.1%
 
d64312911.5%
 
t59995710.7%
 
i59333510.6%
 
n5067789.1%
 
e4354037.8%
 
c4209377.5%
 
a3554476.4%
 
s2683024.8%
 
l2506914.5%
 
b2405374.3%
 
r2341884.2%
 
m936311.7%
 
h562061.0%
 
p328200.6%
 
v293360.5%
 
u169490.3%
 
j87900.2%
 
g49950.1%
 
w2141< 0.1%
 
y329< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
_356376100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII5939597100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
o78932013.3%
 
d64312910.8%
 
t59995710.1%
 
i59333510.0%
 
n5067788.5%
 
e4354037.3%
 
c4209377.1%
 
_3563766.0%
 
a3554476.0%
 
s2683024.5%
 
l2506914.2%
 
b2405374.0%
 
r2341883.9%
 
m936311.6%
 
h562060.9%
 
p328200.6%
 
v293360.5%
 
u169490.3%
 
j87900.1%
 
g49950.1%
 
w2141< 0.1%
 
y329< 0.1%
 

title
Categorical

HIGH CARDINALITY

Distinct48817
Distinct (%)12.4%
Missing1755
Missing (%)0.4%
Memory size3.0 MiB
Debt consolidation
152472 
Credit card refinancing
51487 
Home improvement
15264 
Other
 
12930
Debt Consolidation
 
11608
Other values (48812)
150514 
ValueCountFrequency (%) 
Debt consolidation15247238.5%
 
Credit card refinancing5148713.0%
 
Home improvement152643.9%
 
Other129303.3%
 
Debt Consolidation116082.9%
 
Major purchase47691.2%
 
Consolidation38521.0%
 
debt consolidation35470.9%
 
Business29490.7%
 
Debt Consolidation Loan28640.7%
 
Medical expenses27420.7%
 
Car financing21390.5%
 
Credit Card Consolidation17750.4%
 
Vacation17170.4%
 
Moving and relocation16890.4%
 
consolidation15950.4%
 
Personal Loan15910.4%
 
Consolidation Loan12990.3%
 
Home Improvement12680.3%
 
Home buying11830.3%
 
Credit Card Refinance10940.3%
 
Credit Card Payoff10520.3%
 
Consolidate9190.2%
 
Personal8580.2%
 
Loan7510.2%
 
Other values (48792)11086128.0%
 
(Missing)17550.4%
 
Frequencies of value counts

Unique

Unique41798
Unique (%)10.6%
Histogram of lengths of the category

Length

Max length80
Median length18
Mean length17.17798399
Min length2

Overview of Unicode Properties

Unique unicode characters101
Unique unicode categories15
Unique unicode scripts2
Unique unicode blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
o73579110.8%
 
n68636110.1%
 
i6556949.6%
 
t5452688.0%
 
e5210047.7%
 
4945617.3%
 
a4475026.6%
 
d3861015.7%
 
c3228284.7%
 
r2956304.3%
 
s2629213.9%
 
l2490623.7%
 
b2016813.0%
 
D1879302.8%
 
C1314221.9%
 
f979571.4%
 
m810371.2%
 
g754361.1%
 
p488530.7%
 
h336250.5%
 
y299170.4%
 
u270750.4%
 
v268860.4%
 
L266220.4%
 
H246030.4%
 
Other values (76)2072303.0%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter575466484.6%
 
Uppercase Letter5275897.8%
 
Space Separator4945617.3%
 
Decimal Number137230.2%
 
Other Punctuation91470.1%
 
Dash Punctuation1929< 0.1%
 
Connector Punctuation663< 0.1%
 
Close Punctuation209< 0.1%
 
Currency Symbol178< 0.1%
 
Open Punctuation163< 0.1%
 
Math Symbol151< 0.1%
 
Control15< 0.1%
 
Modifier Symbol3< 0.1%
 
Other Symbol1< 0.1%
 
Other Number1< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
D18793035.6%
 
C13142224.9%
 
L266225.0%
 
H246034.7%
 
O232484.4%
 
P181673.4%
 
M176733.3%
 
R135272.6%
 
B112362.1%
 
I98671.9%
 
E87121.7%
 
F85561.6%
 
S81161.5%
 
A80341.5%
 
T80331.5%
 
N73811.4%
 
G37010.7%
 
W29670.6%
 
V29570.6%
 
Y15430.3%
 
U13940.3%
 
K8270.2%
 
J6470.1%
 
X182< 0.1%
 
Q128< 0.1%
 
Other values (2)116< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
o73579112.8%
 
n68636111.9%
 
i65569411.4%
 
t5452689.5%
 
e5210049.1%
 
a4475027.8%
 
d3861016.7%
 
c3228285.6%
 
r2956305.1%
 
s2629214.6%
 
l2490624.3%
 
b2016813.5%
 
f979571.7%
 
m810371.4%
 
g754361.3%
 
p488530.8%
 
h336250.6%
 
y299170.5%
 
u270750.5%
 
v268860.5%
 
w71250.1%
 
j59250.1%
 
x57310.1%
 
k44820.1%
 
q427< 0.1%
 
Other values (2)345< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
494561100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
!247427.0%
 
/173819.0%
 
.164718.0%
 
'104011.4%
 
,8489.3%
 
&7788.5%
 
%1431.6%
 
#1321.4%
 
:1251.4%
 
"1081.2%
 
?380.4%
 
;350.4%
 
*250.3%
 
@100.1%
 
\60.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
1402829.4%
 
2344625.1%
 
0320323.3%
 
313069.5%
 
43872.8%
 
53642.7%
 
93172.3%
 
62912.1%
 
71931.4%
 
81881.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-1929100.0%
 

Most frequent Control characters

ValueCountFrequency (%) 
1173.3%
 
€213.3%
 
™16.7%
 
…16.7%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_663100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(15896.9%
 
[53.1%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)20598.1%
 
]41.9%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$178100.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+10368.2%
 
=1711.3%
 
~96.0%
 
<96.0%
 
>85.3%
 
|53.3%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`266.7%
 
^133.3%
 

Most frequent Other Symbol characters

ValueCountFrequency (%) 
¦1100.0%
 

Most frequent Other Number characters

ValueCountFrequency (%) 
³1100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin628225392.3%
 
Common5207447.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
o73579111.7%
 
n68636110.9%
 
i65569410.4%
 
t5452688.7%
 
e5210048.3%
 
a4475027.1%
 
d3861016.1%
 
c3228285.1%
 
r2956304.7%
 
s2629214.2%
 
l2490624.0%
 
b2016813.2%
 
D1879303.0%
 
C1314222.1%
 
f979571.6%
 
m810371.3%
 
g754361.2%
 
p488530.8%
 
h336250.5%
 
y299170.5%
 
u270750.4%
 
v268860.4%
 
L266220.4%
 
H246030.4%
 
O232480.4%
 
Other values (29)1577992.5%
 

Most frequent Common characters

ValueCountFrequency (%) 
49456195.0%
 
140280.8%
 
234460.7%
 
032030.6%
 
!24740.5%
 
-19290.4%
 
/17380.3%
 
.16470.3%
 
313060.3%
 
'10400.2%
 
,8480.2%
 
&7780.1%
 
_6630.1%
 
43870.1%
 
53640.1%
 
93170.1%
 
62910.1%
 
)205< 0.1%
 
7193< 0.1%
 
8188< 0.1%
 
$178< 0.1%
 
(158< 0.1%
 
%143< 0.1%
 
#132< 0.1%
 
:125< 0.1%
 
Other values (22)4020.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII6802988> 99.9%
 
None9< 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
o73579110.8%
 
n68636110.1%
 
i6556949.6%
 
t5452688.0%
 
e5210047.7%
 
4945617.3%
 
a4475026.6%
 
d3861015.7%
 
c3228284.7%
 
r2956304.3%
 
s2629213.9%
 
l2490623.7%
 
b2016813.0%
 
D1879302.8%
 
C1314221.9%
 
f979571.4%
 
m810371.2%
 
g754361.1%
 
p488530.7%
 
h336250.5%
 
y299170.4%
 
u270750.4%
 
v268860.4%
 
L266220.4%
 
H246030.4%
 
Other values (69)2072213.0%
 

Most frequent None characters

ValueCountFrequency (%) 
â222.2%
 
€222.2%
 
™111.1%
 
…111.1%
 
¦111.1%
 
Ã111.1%
 
³111.1%
 

dti
Real number (ℝ≥0)

SKEWED

Distinct4262
Distinct (%)1.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean17.37951365
Minimum0
Maximum9999
Zeros313
Zeros (%)0.1%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile4.68
Q111.28
median16.91
Q322.98
95-th percentile31.58
Maximum9999
Range9999
Interquartile range (IQR)11.7

Descriptive statistics

Standard deviation18.01909234
Coefficient of variation (CV)1.036800725
Kurtosis237923.6765
Mean17.37951365
Median Absolute Deviation (MAD)5.83
Skewness431.0512254
Sum6882808.79
Variance324.6876889
MonotonicityNot monotonic
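Two of the descriptive statistics above are derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A minimal check using the values reported for dti:

```python
import math

# Values copied from the dti statistics above.
mean = 17.37951365
std = 18.01909234
variance = 324.6876889

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean
print(round(cv, 6))

# Variance is the square of the standard deviation (up to rounding).
print(math.isclose(std ** 2, variance, rel_tol=1e-6))
```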
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
03130.1%
 
14.43100.1%
 
19.23020.1%
 
16.83010.1%
 
183000.1%
 
20.42960.1%
 
122930.1%
 
13.22910.1%
 
21.62700.1%
 
15.62660.1%
 
11.522540.1%
 
10.82470.1%
 
22.82450.1%
 
12.482450.1%
 
9.62430.1%
 
17.762380.1%
 
12.722370.1%
 
13.682330.1%
 
15.842330.1%
 
16.22330.1%
 
16.322300.1%
 
13.922250.1%
 
18.482240.1%
 
20.882240.1%
 
19.922240.1%
 
Other values (4237)38955398.4%
 
ValueCountFrequency (%) 
03130.1%
 
0.018< 0.1%
 
0.0212< 0.1%
 
0.035< 0.1%
 
0.045< 0.1%
 
0.056< 0.1%
 
0.067< 0.1%
 
0.077< 0.1%
 
0.088< 0.1%
 
0.094< 0.1%
 
ValueCountFrequency (%) 
99991< 0.1%
 
16221< 0.1%
 
380.531< 0.1%
 
189.91< 0.1%
 
145.651< 0.1%
 
138.031< 0.1%
 
120.661< 0.1%
 
107.551< 0.1%
 
93.861< 0.1%
 
92.131< 0.1%
 

earliest_cr_line
Categorical

HIGH CARDINALITY

Distinct684
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
Oct-2000
 
3017
Aug-2000
 
2935
Oct-2001
 
2896
Aug-2001
 
2884
Nov-2000
 
2736
Other values (679)
381562 
ValueCountFrequency (%) 
Oct-200030170.8%
 
Aug-200029350.7%
 
Oct-200128960.7%
 
Aug-200128840.7%
 
Nov-200027360.7%
 
Oct-199927260.7%
 
Nov-199927000.7%
 
Sep-200026910.7%
 
Oct-200226400.7%
 
Aug-200225990.7%
 
Sep-200125650.6%
 
Aug-199925480.6%
 
Sep-200225300.6%
 
Sep-199925300.6%
 
Dec-200025080.6%
 
Sep-200324910.6%
 
Dec-199924790.6%
 
Oct-200324390.6%
 
Nov-200124320.6%
 
Dec-200124230.6%
 
Jul-200124160.6%
 
Jul-200023690.6%
 
May-200123340.6%
 
Jan-200123340.6%
 
Dec-199823290.6%
 
Other values (659)33147983.7%
 
Frequencies of value counts

Unique

Unique45
Unique (%)< 0.1%
Histogram of lengths of the category

Length

Max length8
Median length8
Mean length8
Min length8

Overview of Unicode Properties

Unique unicode characters33
Unique unicode categories4
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
041655713.1%
 
940238412.7%
 
-39603012.5%
 
12536128.0%
 
22282607.2%
 
e1004033.2%
 
u997663.1%
 
J931112.9%
 
a927562.9%
 
8782972.5%
 
c719782.3%
 
p669042.1%
 
A665802.1%
 
M620622.0%
 
n611391.9%
 
r608481.9%
 
7449221.4%
 
4408091.3%
 
6403211.3%
 
3395681.2%
 
5393901.2%
 
O382911.2%
 
t382911.2%
 
S376731.2%
 
g373491.2%
 
Other values (8)2609398.2%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number158412050.0%
 
Lowercase Letter79206025.0%
 
Uppercase Letter39603012.5%
 
Dash Punctuation39603012.5%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
J9311123.5%
 
A6658016.8%
 
M6206215.7%
 
O382919.7%
 
S376739.5%
 
N355839.0%
 
D336878.5%
 
F290437.3%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e10040312.7%
 
u9976612.6%
 
a9275611.7%
 
c719789.1%
 
p669048.4%
 
n611397.7%
 
r608487.7%
 
t382914.8%
 
g373494.7%
 
o355834.5%
 
v355834.5%
 
l319724.0%
 
y304453.8%
 
b290433.7%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-396030100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
041655726.3%
 
940238425.4%
 
125361216.0%
 
222826014.4%
 
8782974.9%
 
7449222.8%
 
4408092.6%
 
6403212.5%
 
3395682.5%
 
5393902.5%
 

Most occurring scripts

ValueCountFrequency (%) 
Common198015062.5%
 
Latin118809037.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e1004038.5%
 
u997668.4%
 
J931117.8%
 
a927567.8%
 
c719786.1%
 
p669045.6%
 
A665805.6%
 
M620625.2%
 
n611395.1%
 
r608485.1%
 
O382913.2%
 
t382913.2%
 
S376733.2%
 
g373493.1%
 
N355833.0%
 
o355833.0%
 
v355833.0%
 
D336872.8%
 
l319722.7%
 
y304452.6%
 
F290432.4%
 
b290432.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
041655721.0%
 
940238420.3%
 
-39603020.0%
 
125361212.8%
 
222826011.5%
 
8782974.0%
 
7449222.3%
 
4408092.1%
 
6403212.0%
 
3395682.0%
 
5393902.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII3168240100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
041655713.1%
 
940238412.7%
 
-39603012.5%
 
12536128.0%
 
22282607.2%
 
e1004033.2%
 
u997663.1%
 
J931112.9%
 
a927562.9%
 
8782972.5%
 
c719782.3%
 
p669042.1%
 
A665802.1%
 
M620622.0%
 
n611391.9%
 
r608481.9%
 
7449221.4%
 
4408091.3%
 
6403211.3%
 
3395681.2%
 
5393901.2%
 
O382911.2%
 
t382911.2%
 
S376731.2%
 
g373491.2%
 
Other values (8)2609398.2%
 

open_acc
Real number (ℝ≥0)

Distinct61
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11.3111532
Minimum0
Maximum90
Zeros6
Zeros (%)< 0.1%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile5
Q18
median10
Q314
95-th percentile21
Maximum90
Range90
Interquartile range (IQR)6

Descriptive statistics

Standard deviation5.137648808
Coefficient of variation (CV)0.4542108766
Kurtosis2.966944774
Mean11.3111532
Median Absolute Deviation (MAD)3
Skewness1.213018844
Sum4479556
Variance26.39543527
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
9367799.3%
 
10354418.9%
 
8351378.9%
 
11326958.3%
 
7313287.9%
 
12291577.4%
 
6259276.5%
 
13249836.3%
 
14211735.3%
 
5183084.6%
 
15173474.4%
 
16143763.6%
 
17116182.9%
 
4107092.7%
 
1894302.4%
 
1977232.0%
 
2059731.5%
 
347831.2%
 
2146501.2%
 
2236920.9%
 
2329440.7%
 
2423640.6%
 
2517910.5%
 
214590.4%
 
2612730.3%
 
Other values (36)49701.3%
 
ValueCountFrequency (%) 
06< 0.1%
 
185< 0.1%
 
214590.4%
 
347831.2%
 
4107092.7%
 
5183084.6%
 
6259276.5%
 
7313287.9%
 
8351378.9%
 
9367799.3%
 
ValueCountFrequency (%) 
901< 0.1%
 
762< 0.1%
 
581< 0.1%
 
571< 0.1%
 
562< 0.1%
 
552< 0.1%
 
543< 0.1%
 
536< 0.1%
 
523< 0.1%
 
514< 0.1%
 

pub_rec
Real number (ℝ≥0)

ZEROS

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.1781910461
Minimum0
Maximum86
Zeros338272
Zeros (%)85.4%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum86
Range86
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.5306706005
Coefficient of variation (CV)2.978099136
Kurtosis1867.466643
Mean0.1781910461
Median Absolute Deviation (MAD)0
Skewness16.5765642
Sum70569
Variance0.2816112862
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%) 
033827285.4%
 
14973912.6%
 
254761.4%
 
315210.4%
 
45270.1%
 
52370.1%
 
6122< 0.1%
 
756< 0.1%
 
834< 0.1%
 
912< 0.1%
 
1011< 0.1%
 
118< 0.1%
 
134< 0.1%
 
124< 0.1%
 
192< 0.1%
 
861< 0.1%
 
401< 0.1%
 
171< 0.1%
 
151< 0.1%
 
241< 0.1%
 
ValueCountFrequency (%) 
033827285.4%
 
14973912.6%
 
254761.4%
 
315210.4%
 
45270.1%
 
52370.1%
 
6122< 0.1%
 
756< 0.1%
 
834< 0.1%
 
912< 0.1%
 
ValueCountFrequency (%) 
861< 0.1%
 
401< 0.1%
 
241< 0.1%
 
192< 0.1%
 
171< 0.1%
 
151< 0.1%
 
134< 0.1%
 
124< 0.1%
 
118< 0.1%
 
1011< 0.1%
 

revol_bal
Real number (ℝ≥0)

Distinct55622
Distinct (%)14.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15844.53985
Minimum0
Maximum1743266
Zeros2128
Zeros (%)0.5%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile1685
Q16025
median11181
Q319620
95-th percentile41066.55
Maximum1743266
Range1743266
Interquartile range (IQR)13595

Descriptive statistics

Standard deviation20591.83611
Coefficient of variation (CV)1.299617174
Kurtosis384.2210931
Mean15844.53985
Median Absolute Deviation (MAD)6112
Skewness11.72751512
Sum6274913118
Variance424023714.3
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
021280.5%
 
565541< 0.1%
 
609538< 0.1%
 
779238< 0.1%
 
395337< 0.1%
 
607736< 0.1%
 
509836< 0.1%
 
652135< 0.1%
 
454135< 0.1%
 
523535< 0.1%
 
1036235< 0.1%
 
578935< 0.1%
 
524935< 0.1%
 
538935< 0.1%
 
850235< 0.1%
 
644434< 0.1%
 
717934< 0.1%
 
950834< 0.1%
 
399734< 0.1%
 
480834< 0.1%
 
515234< 0.1%
 
546334< 0.1%
 
761834< 0.1%
 
551434< 0.1%
 
567134< 0.1%
 
Other values (55597)39305699.2%
 
ValueCountFrequency (%) 
021280.5%
 
130< 0.1%
 
226< 0.1%
 
328< 0.1%
 
420< 0.1%
 
523< 0.1%
 
630< 0.1%
 
721< 0.1%
 
821< 0.1%
 
923< 0.1%
 
ValueCountFrequency (%) 
17432661< 0.1%
 
12987831< 0.1%
 
11900461< 0.1%
 
10308261< 0.1%
 
10239401< 0.1%
 
9758001< 0.1%
 
8675281< 0.1%
 
8386981< 0.1%
 
8143001< 0.1%
 
7786141< 0.1%
 

revol_util
Real number (ℝ≥0)

Distinct1226
Distinct (%)0.3%
Missing276
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean53.79174864
Minimum0
Maximum892.3
Zeros2213
Zeros (%)0.6%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile11.2
Q135.8
median54.8
Q372.9
95-th percentile92
Maximum892.3
Range892.3
Interquartile range (IQR)37.1

Descriptive statistics

Standard deviation24.45219306
Coefficient of variation (CV)0.4545714479
Kurtosis2.71227821
Mean53.79174864
Median Absolute Deviation (MAD)18.5
Skewness-0.07177802033
Sum21288299.69
Variance597.9097456
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
022130.6%
 
537520.2%
 
607390.2%
 
617340.2%
 
557300.2%
 
547250.2%
 
627210.2%
 
477200.2%
 
577190.2%
 
587170.2%
 
597080.2%
 
657060.2%
 
637010.2%
 
466980.2%
 
566890.2%
 
516810.2%
 
496790.2%
 
486710.2%
 
526640.2%
 
506610.2%
 
646570.2%
 
696540.2%
 
446520.2%
 
676440.2%
 
416380.2%
 
Other values (1201)37688195.2%
 
ValueCountFrequency (%) 
022130.6%
 
0.011< 0.1%
 
0.041< 0.1%
 
0.051< 0.1%
 
0.12530.1%
 
0.161< 0.1%
 
0.22110.1%
 
0.3187< 0.1%
 
0.4189< 0.1%
 
0.461< 0.1%
 
ValueCountFrequency (%) 
892.31< 0.1%
 
1531< 0.1%
 
152.51< 0.1%
 
150.71< 0.1%
 
1481< 0.1%
 
146.11< 0.1%
 
145.81< 0.1%
 
140.41< 0.1%
 
136.71< 0.1%
 
132.11< 0.1%
 

total_acc
Real number (ℝ≥0)

Distinct118
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean25.41474383
Minimum2
Maximum151
Zeros0
Zeros (%)0.0%
Memory size3.0 MiB

Quantile statistics

Minimum2
5-th percentile9
Q117
median24
Q332
95-th percentile47
Maximum151
Range149
Interquartile range (IQR)15

Descriptive statistics

Standard deviation11.88699072
Coefficient of variation (CV)0.4677202651
Kurtosis1.204620014
Mean25.41474383
Median Absolute Deviation (MAD)8
Skewness0.8643276369
Sum10065001
Variance141.3005485
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
21142803.6%
 
22142603.6%
 
20142283.6%
 
23139233.5%
 
24138783.5%
 
19138763.5%
 
18137103.5%
 
17134953.4%
 
25132253.3%
 
26127993.2%
 
16127713.2%
 
27123433.1%
 
15122833.1%
 
28117063.0%
 
14115242.9%
 
29112742.8%
 
13109362.8%
 
30105872.7%
 
3198692.5%
 
1298582.5%
 
3295522.4%
 
1188442.2%
 
3386822.2%
 
3480882.0%
 
1076721.9%
 
Other values (93)10236725.8%
 
ValueCountFrequency (%) 
218< 0.1%
 
33270.1%
 
412380.3%
 
520280.5%
 
629230.7%
 
741431.0%
 
853651.4%
 
963621.6%
 
1076721.9%
 
1188442.2%
 
ValueCountFrequency (%) 
1511< 0.1%
 
1501< 0.1%
 
1351< 0.1%
 
1291< 0.1%
 
1241< 0.1%
 
1181< 0.1%
 
1171< 0.1%
 
1162< 0.1%
 
1151< 0.1%
 
1112< 0.1%
 

initial_list_status
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
f
238066 
w
157964 
ValueCountFrequency (%) 
f23806660.1%
 
w15796439.9%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length1
Median length1
Mean length1
Min length1

Overview of Unicode Properties

Unique unicode characters2
Unique unicode categories1
Unique unicode scripts1
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
f23806660.1%
 
w15796439.9%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter396030100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
f23806660.1%
 
w15796439.9%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin396030100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
f23806660.1%
 
w15796439.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII396030100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
f23806660.1%
 
w15796439.9%
 

application_type
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
INDIVIDUAL
395319 
JOINT
 
425
DIRECT_PAY
 
286
ValueCountFrequency (%) 
INDIVIDUAL39531999.8%
 
JOINT4250.1%
 
DIRECT_PAY2860.1%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length10
Median length10
Mean length9.994634245
Min length5

Overview of Unicode Properties

Unique unicode characters16
Unique unicode categories2
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
I118666830.0%
 
D79092420.0%
 
N39574410.0%
 
A39560510.0%
 
V39531910.0%
 
U39531910.0%
 
L39531910.0%
 
T711< 0.1%
 
J425< 0.1%
 
O425< 0.1%
 
R286< 0.1%
 
E286< 0.1%
 
C286< 0.1%
 
_286< 0.1%
 
P286< 0.1%
 
Y286< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter3957889> 99.9%
 
Connector Punctuation286< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
I118666830.0%
 
D79092420.0%
 
N39574410.0%
 
A39560510.0%
 
V39531910.0%
 
U39531910.0%
 
L39531910.0%
 
T711< 0.1%
 
J425< 0.1%
 
O425< 0.1%
 
R286< 0.1%
 
E286< 0.1%
 
C286< 0.1%
 
P286< 0.1%
 
Y286< 0.1%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_286100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin3957889> 99.9%
 
Common286< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
I118666830.0%
 
D79092420.0%
 
N39574410.0%
 
A39560510.0%
 
V39531910.0%
 
U39531910.0%
 
L39531910.0%
 
T711< 0.1%
 
J425< 0.1%
 
O425< 0.1%
 
R286< 0.1%
 
E286< 0.1%
 
C286< 0.1%
 
P286< 0.1%
 
Y286< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
_286100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII3958175100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
I118666830.0%
 
D79092420.0%
 
N39574410.0%
 
A39560510.0%
 
V39531910.0%
 
U39531910.0%
 
L39531910.0%
 
T711< 0.1%
 
J425< 0.1%
 
O425< 0.1%
 
R286< 0.1%
 
E286< 0.1%
 
C286< 0.1%
 
_286< 0.1%
 
P286< 0.1%
 
Y286< 0.1%
 

mort_acc
Real number (ℝ≥0)

MISSING
ZEROS

Distinct33
Distinct (%)< 0.1%
Missing37795
Missing (%)9.5%
Infinite0
Infinite (%)0.0%
Mean1.813990816
Minimum0
Maximum34
Zeros139777
Zeros (%)35.3%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q33
95-th percentile6
Maximum34
Range34
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.147930467
Coefficient of variation (CV)1.184091148
Kurtosis4.477175726
Mean1.813990816
Median Absolute Deviation (MAD)1
Skewness1.600132438
Sum649835
Variance4.613605292
MonotonicityNot monotonic
Histogram with fixed size bins (bins=33)
ValueCountFrequency (%) 
013977735.3%
 
16041615.3%
 
24994812.6%
 
3380499.6%
 
4278877.0%
 
5181944.6%
 
6110692.8%
 
760521.5%
 
831210.8%
 
916560.4%
 
108650.2%
 
114790.1%
 
122640.1%
 
13146< 0.1%
 
14107< 0.1%
 
1561< 0.1%
 
1637< 0.1%
 
1722< 0.1%
 
1818< 0.1%
 
1915< 0.1%
 
2013< 0.1%
 
2410< 0.1%
 
227< 0.1%
 
214< 0.1%
 
254< 0.1%
 
Other values (8)14< 0.1%
 
(Missing)377959.5%
 
ValueCountFrequency (%) 
013977735.3%
 
16041615.3%
 
24994812.6%
 
3380499.6%
 
4278877.0%
 
5181944.6%
 
6110692.8%
 
760521.5%
 
831210.8%
 
916560.4%
 
ValueCountFrequency (%) 
341< 0.1%
 
322< 0.1%
 
312< 0.1%
 
301< 0.1%
 
281< 0.1%
 
273< 0.1%
 
262< 0.1%
 
254< 0.1%
 
2410< 0.1%
 
232< 0.1%
 

pub_rec_bankruptcies
Real number (ℝ≥0)

ZEROS

Distinct9
Distinct (%)< 0.1%
Missing535
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean0.1216475556
Minimum0
Maximum8
Zeros350380
Zeros (%)88.5%
Memory size3.0 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum8
Range8
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.3561742766
Coefficient of variation (CV)2.927919718
Kurtosis18.10416044
Mean0.1216475556
Median Absolute Deviation (MAD)0
Skewness3.423440368
Sum48111
Variance0.1268601153
MonotonicityNot monotonic
Histogram with fixed size bins (bins=9)
ValueCountFrequency (%) 
035038088.5%
 
14279010.8%
 
218470.5%
 
33510.1%
 
482< 0.1%
 
532< 0.1%
 
67< 0.1%
 
74< 0.1%
 
82< 0.1%
 
(Missing)5350.1%
 
ValueCountFrequency (%) 
035038088.5%
 
14279010.8%
 
218470.5%
 
33510.1%
 
482< 0.1%
 
532< 0.1%
 
67< 0.1%
 
74< 0.1%
 
82< 0.1%
 
ValueCountFrequency (%) 
82< 0.1%
 
74< 0.1%
 
67< 0.1%
 
532< 0.1%
 
482< 0.1%
 
33510.1%
 
218470.5%
 
14279010.8%
 
035038088.5%
 

address
Categorical

HIGH CARDINALITY
UNIFORM

Distinct393700
Distinct (%)99.4%
Missing0
Missing (%)0.0%
Memory size3.0 MiB
USS Smith FPO AP 70466
 
8
USNS Johnson FPO AE 05113
 
8
USCGC Smith FPO AE 70466
 
8
USS Johnson FPO AE 48052
 
8
USNS Johnson FPO AP 48052
 
7
Other values (393695)
395991 
ValueCountFrequency (%) 
USS Smith FPO AP 704668< 0.1%
 
USNS Johnson FPO AE 051138< 0.1%
 
USCGC Smith FPO AE 704668< 0.1%
 
USS Johnson FPO AE 480528< 0.1%
 
USNS Johnson FPO AP 480527< 0.1%
 
USCGC Miller FPO AA 226906< 0.1%
 
USCGC Jones FPO AE 226906< 0.1%
 
USNS Johnson FPO AA 704666< 0.1%
 
USNV Smith FPO AE 307236< 0.1%
 
USNV Smith FPO AA 008136< 0.1%
 
USCGC Smith FPO AA 704666< 0.1%
 
USS Smith FPO AP 226906< 0.1%
 
USNV Brown FPO AA 480526< 0.1%
 
USCGC Smith FPO AE 480525< 0.1%
 
USCGC Jones FPO AE 307235< 0.1%
 
USS Smith FPO AA 704665< 0.1%
 
USCGC Williams FPO AE 008135< 0.1%
 
USNS Smith FPO AE 480525< 0.1%
 
USNS Smith FPO AE 008135< 0.1%
 
USCGC Smith FPO AE 051135< 0.1%
 
USNS Williams FPO AA 480525< 0.1%
 
USNV Brown FPO AP 008135< 0.1%
 
USCGC Brown FPO AA 307235< 0.1%
 
USCGC Smith FPO AA 295975< 0.1%
 
USS Williams FPO AE 008135< 0.1%
 
Other values (393675)395883> 99.9%
 
Frequencies of value counts

Unique

Unique391984
Unique (%)99.0%
Histogram of lengths of the category

Length

Max length69
Median length45
Mean length44.71395096
Min length20

Overview of Unicode Properties

Unique unicode characters67
Unique unicode categories6
Unique unicode scripts2
Unique unicode blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
212862612.0%
 
e9115455.1%
 
a7354274.2%
 
t7027874.0%
 
r6567483.7%
 
06248253.5%
 
i5800433.3%
 
o5794803.3%
 
n5513503.1%
 
24875252.8%
 
s4716082.7%
 
34439922.5%
 
64212622.4%
 
l4002732.3%
 
3960302.2%
 
3960302.2%
 
73875222.2%
 
13759622.1%
 
93754522.1%
 
53753012.1%
 
,3677062.1%
 
h3418281.9%
 
43302791.9%
 
83298001.9%
 
u3142991.8%
 
Other values (42)402236622.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter769071543.4%
 
Decimal Number415192023.4%
 
Uppercase Letter248863914.1%
 
Space Separator212862612.0%
 
Control7920604.5%
 
Other Punctuation4561062.6%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
062482515.0%
 
248752511.7%
 
344399210.7%
 
642126210.1%
 
73875229.3%
 
13759629.1%
 
93754529.0%
 
53753019.0%
 
43302798.0%
 
83298007.9%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
2128626100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A29595011.9%
 
S27428911.0%
 
P1646446.6%
 
M1617676.5%
 
C1578936.3%
 
N1486226.0%
 
D1067854.3%
 
L1053744.2%
 
W942643.8%
 
R934083.8%
 
T923403.7%
 
J906263.6%
 
E871533.5%
 
O864413.5%
 
B835033.4%
 
I692622.8%
 
K691152.8%
 
H626042.5%
 
V589102.4%
 
F575152.3%
 
G486912.0%
 
U400111.6%
 
Y233100.9%
 
Z88790.4%
 
X71030.3%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e91154511.9%
 
a7354279.6%
 
t7027879.1%
 
r6567488.5%
 
i5800437.5%
 
o5794807.5%
 
n5513507.2%
 
s4716086.1%
 
l4002735.2%
 
h3418284.4%
 
u3142994.1%
 
d1865122.4%
 
y1651132.1%
 
p1602142.1%
 
c1393311.8%
 
m1312451.7%
 
g1168551.5%
 
w1110061.4%
 
b998881.3%
 
v983761.3%
 
k910921.2%
 
f608870.8%
 
x393800.5%
 
z339160.4%
 
q89420.1%
 

Most frequent Control characters

ValueCountFrequency (%) 
39603050.0%
 
39603050.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,36770680.6%
 
.8840019.4%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1017935457.5%
 
Common752871242.5%
 

Most frequent Common characters

ValueCountFrequency (%) 
212862628.3%
 
06248258.3%
 
24875256.5%
 
34439925.9%
 
64212625.6%
 
3960305.3%
 
3960305.3%
 
73875225.1%
 
13759625.0%
 
93754525.0%
 
53753015.0%
 
,3677064.9%
 
43302794.4%
 
83298004.4%
 
.884001.2%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e9115459.0%
 
a7354277.2%
 
t7027876.9%
 
r6567486.5%
 
i5800435.7%
 
o5794805.7%
 
n5513505.4%
 
s4716084.6%
 
l4002733.9%
 
h3418283.4%
 
u3142993.1%
 
A2959502.9%
 
S2742892.7%
 
d1865121.8%
 
y1651131.6%
 
P1646441.6%
 
M1617671.6%
 
p1602141.6%
 
C1578931.6%
 
N1486221.5%
 
c1393311.4%
 
m1312451.3%
 
g1168551.1%
 
w1110061.1%
 
D1067851.0%
 
Other values (27)161374015.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII17708066100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
212862612.0%
 
e9115455.1%
 
a7354274.2%
 
t7027874.0%
 
r6567483.7%
 
06248253.5%
 
i5800433.3%
 
o5794803.3%
 
n5513503.1%
 
24875252.8%
 
s4716082.7%
 
34439922.5%
 
64212622.4%
 
l4002732.3%
 
3960302.2%
 
3960302.2%
 
73875222.2%
 
13759622.1%
 
93754522.1%
 
53753012.1%
 
,3677062.1%
 
h3418281.9%
 
43302791.9%
 
83298001.9%
 
u3142991.8%
 
Other values (42)402236622.7%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
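The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation on toy data, not the code the profiling tool runs; the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Toy data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

r_linear = pearson_r(x, y)
# Shifting and positively rescaling y leaves r unchanged.
r_rescaled = pearson_r(x, [10 * v - 4 for v in y])
print(r_linear, r_rescaled)
```

Both values are 1 up to floating-point rounding, which illustrates the invariance under changes in location and scale.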

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
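As a rough sketch of that definition, ρ can be obtained by ranking each variable and then applying Pearson's formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are assumptions made for illustration; giving tied values the mean of their positions is a common convention, not something this report specifies.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (toy data)
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because y increases whenever x does, the ranks coincide and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.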

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
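The pair-counting definition above might be sketched as follows. The function name `kendall_tau` and the toy data are hypothetical. The sketch follows the formula exactly as stated here (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries may apply a tie correction instead (SciPy's `kendalltau`, for example, defaults to the tau-b variant), so results can differ on data with ties.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, as described above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie; it counts toward neither group.
    return (concordant - discordant) / len(pairs)


# Toy data: one swapped pair out of six
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

With four observations there are six pairs; five are concordant and one is discordant, so the result is (5 - 1) / 6.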

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
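For illustration, a bias-corrected Cramér's V can be computed from a contingency table as sketched below. The function name `cramers_v_corrected` and the toy tables are assumptions, and the code follows the commonly cited form of Bergsma's correction; it is not taken from the profiling software itself.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy 2x2 tables: perfect association and exact independence
perfect = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The `max(0.0, ...)` clamp keeps the corrected statistic non-negative when the observed association is weaker than what sampling noise alone would produce.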

Missing values

Sample

First rows

index | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | purpose | title | dti | earliest_cr_line | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address
0 | 10000.0 | 36 months | 11.44 | 329.48 | B | B4 | Marketing | 10+ years | RENT | 117000.0 | Not Verified | Jan-2015 | Fully Paid | vacation | Vacation | 26.24 | Jun-1990 | 16.0 | 0.0 | 36369.0 | 41.8 | 25.0 | w | INDIVIDUAL | 0.0 | 0.0 | 0174 Michelle Gateway\r\nMendozaberg, OK 22690
1 | 8000.0 | 36 months | 11.99 | 265.68 | B | B5 | Credit analyst | 4 years | MORTGAGE | 65000.0 | Not Verified | Jan-2015 | Fully Paid | debt_consolidation | Debt consolidation | 22.05 | Jul-2004 | 17.0 | 0.0 | 20131.0 | 53.3 | 27.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1076 Carney Fort Apt. 347\r\nLoganmouth, SD 05113
2 | 15600.0 | 36 months | 10.49 | 506.97 | B | B3 | Statistician | < 1 year | RENT | 43057.0 | Source Verified | Jan-2015 | Fully Paid | credit_card | Credit card refinancing | 12.79 | Aug-2007 | 13.0 | 0.0 | 11987.0 | 92.2 | 26.0 | f | INDIVIDUAL | 0.0 | 0.0 | 87025 Mark Dale Apt. 269\r\nNew Sabrina, WV 05113
3 | 7200.0 | 36 months | 6.49 | 220.65 | A | A2 | Client Advocate | 6 years | RENT | 54000.0 | Not Verified | Nov-2014 | Fully Paid | credit_card | Credit card refinancing | 2.60 | Sep-2006 | 6.0 | 0.0 | 5472.0 | 21.5 | 13.0 | f | INDIVIDUAL | 0.0 | 0.0 | 823 Reid Ford\r\nDelacruzside, MA 00813
4 | 24375.0 | 60 months | 17.27 | 609.33 | C | C5 | Destiny Management Inc. | 9 years | MORTGAGE | 55000.0 | Verified | Apr-2013 | Charged Off | credit_card | Credit Card Refinance | 33.95 | Mar-1999 | 13.0 | 0.0 | 24584.0 | 69.8 | 43.0 | f | INDIVIDUAL | 1.0 | 0.0 | 679 Luna Roads\r\nGreggshire, VA 11650
5 | 20000.0 | 36 months | 13.33 | 677.07 | C | C3 | HR Specialist | 10+ years | MORTGAGE | 86788.0 | Verified | Sep-2015 | Fully Paid | debt_consolidation | Debt consolidation | 16.31 | Jan-2005 | 8.0 | 0.0 | 25757.0 | 100.6 | 23.0 | f | INDIVIDUAL | 4.0 | 0.0 | 1726 Cooper Passage Suite 129\r\nNorth Deniseberg, DE 30723
6 | 18000.0 | 36 months | 5.32 | 542.07 | A | A1 | Software Development Engineer | 2 years | MORTGAGE | 125000.0 | Source Verified | Sep-2015 | Fully Paid | home_improvement | Home improvement | 1.36 | Aug-2005 | 8.0 | 0.0 | 4178.0 | 4.9 | 25.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1008 Erika Vista Suite 748\r\nEast Stephanie, TX 22690
7 | 13000.0 | 36 months | 11.14 | 426.47 | B | B2 | Office Depot | 10+ years | RENT | 46000.0 | Not Verified | Sep-2012 | Fully Paid | credit_card | No More Credit Cards | 26.87 | Sep-1994 | 11.0 | 0.0 | 13425.0 | 64.5 | 15.0 | f | INDIVIDUAL | 0.0 | 0.0 | USCGC Nunez\r\nFPO AE 30723
8 | 18900.0 | 60 months | 10.99 | 410.84 | B | B3 | Application Architect | 10+ years | RENT | 103000.0 | Verified | Oct-2014 | Fully Paid | debt_consolidation | Debt consolidation | 12.52 | Jun-1994 | 13.0 | 0.0 | 18637.0 | 32.9 | 40.0 | w | INDIVIDUAL | 3.0 | 0.0 | USCGC Tran\r\nFPO AP 22690
9 | 26300.0 | 36 months | 16.29 | 928.40 | C | C5 | Regado Biosciences | 3 years | MORTGAGE | 115000.0 | Verified | Apr-2012 | Fully Paid | debt_consolidation | Debt Consolidation | 23.69 | Dec-1997 | 13.0 | 0.0 | 22171.0 | 82.4 | 37.0 | f | INDIVIDUAL | 1.0 | 0.0 | 3390 Luis Rue\r\nMauricestad, VA 00813

Last rows

index | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | issue_d | loan_status | purpose | title | dti | earliest_cr_line | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address
396020 | 10000.0 | 36 months | 9.76 | 321.55 | B | B3 | Retirement Counselor | 10+ years | RENT | 40000.0 | Not Verified | Dec-2015 | Fully Paid | debt_consolidation | Debt consolidation | 23.40 | Jan-1988 | 9.0 | 0.0 | 8819.0 | 57.3 | 18.0 | w | INDIVIDUAL | 1.0 | 0.0 | 914 Alexander Mountains Apt. 604\r\nEast Marco, VT 70466
396021 | 3200.0 | 36 months | 5.42 | 96.52 | A | A1 | St Francis Medical Center | 10+ years | RENT | 33000.0 | Not Verified | Feb-2011 | Fully Paid | debt_consolidation | 2011 Insurance and Debt Consolidation | 21.45 | Nov-1996 | 18.0 | 0.0 | 3985.0 | 7.6 | 50.0 | f | INDIVIDUAL | NaN | 0.0 | 309 John Mission\r\nWest Marc, NY 00813
396022 | 12000.0 | 36 months | 12.29 | 400.24 | C | C1 | Data Center Specialist II | 1 year | RENT | 52100.0 | Source Verified | Oct-2015 | Fully Paid | debt_consolidation | Debt consolidation | 17.28 | Oct-2004 | 6.0 | 0.0 | 9580.0 | 66.1 | 18.0 | w | INDIVIDUAL | 0.0 | 0.0 | 532 Johnson Drive Apt. 185\r\nAndersonside, NY 70466
396023 | 22000.0 | 36 months | 18.92 | 805.55 | D | D4 | Operations Manager | 10+ years | MORTGAGE | 138000.0 | Not Verified | Apr-2014 | Fully Paid | debt_consolidation | Debt consolidation | 24.43 | May-1998 | 18.0 | 0.0 | 22287.0 | 50.4 | 39.0 | f | INDIVIDUAL | 4.0 | 0.0 | 0297 Flores Dale Suite 441\r\nTaylorland, MD 05113
396024 | 6000.0 | 36 months | 13.11 | 202.49 | B | B4 | Michael's Arts & Crafts | 5 years | RENT | 64000.0 | Not Verified | Mar-2013 | Fully Paid | debt_consolidation | Credit buster | 10.81 | Nov-1991 | 7.0 | 0.0 | 11456.0 | 97.1 | 9.0 | w | INDIVIDUAL | 0.0 | 0.0 | 514 Cynthia Park Apt. 402\r\nWest Williamside, SC 05113
396025 | 10000.0 | 60 months | 10.99 | 217.38 | B | B4 | licensed bankere | 2 years | RENT | 40000.0 | Source Verified | Oct-2015 | Fully Paid | debt_consolidation | Debt consolidation | 15.63 | Nov-2004 | 6.0 | 0.0 | 1990.0 | 34.3 | 23.0 | w | INDIVIDUAL | 0.0 | 0.0 | 12951 Williams Crossing\r\nJohnnyville, DC 30723
396026 | 21000.0 | 36 months | 12.29 | 700.42 | C | C1 | Agent | 5 years | MORTGAGE | 110000.0 | Source Verified | Feb-2015 | Fully Paid | debt_consolidation | Debt consolidation | 21.45 | Feb-2006 | 6.0 | 0.0 | 43263.0 | 95.7 | 8.0 | f | INDIVIDUAL | 1.0 | 0.0 | 0114 Fowler Field Suite 028\r\nRachelborough, LA 05113
396027 | 5000.0 | 36 months | 9.99 | 161.32 | B | B1 | City Carrier | 10+ years | RENT | 56500.0 | Verified | Oct-2013 | Fully Paid | debt_consolidation | pay off credit cards | 17.56 | Mar-1997 | 15.0 | 0.0 | 32704.0 | 66.9 | 23.0 | f | INDIVIDUAL | 0.0 | 0.0 | 953 Matthew Points Suite 414\r\nReedfort, NY 70466
396028 | 21000.0 | 60 months | 15.31 | 503.02 | C | C2 | Gracon Services, Inc | 10+ years | MORTGAGE | 64000.0 | Verified | Aug-2012 | Fully Paid | debt_consolidation | Loanforpayoff | 15.88 | Nov-1990 | 9.0 | 0.0 | 15704.0 | 53.8 | 20.0 | f | INDIVIDUAL | 5.0 | 0.0 | 7843 Blake Freeway Apt. 229\r\nNew Michael, FL 29597
396029 | 2000.0 | 36 months | 13.61 | 67.98 | C | C2 | Internal Revenue Service | 10+ years | RENT | 42996.0 | Verified | Jun-2010 | Fully Paid | debt_consolidation | Toxic Debt Payoff | 8.32 | Sep-1998 | 3.0 | 0.0 | 4292.0 | 91.3 | 19.0 | f | INDIVIDUAL | NaN | 0.0 | 787 Michelle Causeway\r\nBriannaton, AR 48052